For the manual in English, see below!
Eestikeelne / In Estonian:
PowerMul versioonis on asendatud arvude nihutamine
ja liitmine käsuga mul (korrutamine). "Mul" võtab
aega 3-9 tsüklit olenevalt protsessorist, aga see
osutus kasulikuks paljudel juhtudel. Kui on tegemist
arvudega nagu 1024^1024, kus ainult 1 bitt 10-st on
püsti, siis nihutamine kaldub olema palju kiirem,
aga keskmiselt on "mul" umbes 10 korda kiirem
suvaliste arvudega. Kui ma leian võimaluse teha mitu
korrutamist ühel ringil, saan ma kasutada selle käsu
konveieri eeliseid, aga seniks võivad nihutamised
mõningail juhtudel lüüa korrutamist.

Tegelikult ei ole probleem arvutustes, need
toimetatakse murdosa sekundi jooksul, aga suurim
mure on konversioon, sest seda tehakse nihutades
binaararve ja peale võrdlemist 1-ga otsustatakse,
kas liita lõplikule kümnendtulemusele või mitte.
See võtab umbes 350hex/s (heksadetsimaalnumbrit
sekundis) 600MHz P!!! peal.

Viimane versioon - teisendused on viidud jagamise
peale, mis osutus oluliselt kiiremaks. Hetkel on see
kiiruselt teisel kohal, aga kõige rohkem pidurdab
MessageBox'i kuvamine.

Nautige! :)

****************************************************

In English / Ingliskeelne:
The PowerMul version uses the "mul" instruction
instead of many shifts and adds. A "mul" itself
takes about 3-9 clocks depending on the CPU, but it
proved to be beneficial in most cases. For numbers
like 1024^1024, where only 1 bit in 10 is set,
shifting tends to be much faster, but on average
"mul" is about 10 times faster with random numbers.
If I can figure out how to do multiple "mul"s in
one pass, I could take advantage of the instruction
being pipelined, but until then shifts beat "mul"
in some cases.
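
The trade-off above can be sketched in Python. This
is only an illustration, not the program's actual
assembly code: the names mul_word, mul_shift_add and
to_int are hypothetical, and 32-bit limbs stored
least significant first are an assumption. One
version does one multiply per limb; the other does
one shifted add per set bit of the multiplier.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit limbs, least significant first


def to_int(limbs):
    """Combine little-endian 32-bit limbs into one Python int."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def mul_word(limbs, m):
    """Multiply by m (< 2**32) using one multiply per limb."""
    out, carry = [], 0
    for limb in limbs:
        prod = limb * m + carry   # plays the role of "mul" plus a carry add
        out.append(prod & MASK)   # low 32 bits stay in this limb
        carry = prod >> 32        # high bits carry into the next limb
    if carry:
        out.append(carry)
    return out


def mul_shift_add(limbs, m):
    """Multiply by m using one shifted add per set bit of m.

    Returns (product, number_of_adds) so the cost is visible.
    """
    n = to_int(limbs)
    total, adds, shift = 0, 0, 0
    while m:
        if m & 1:                 # only set bits cost an add
            total += n << shift
            adds += 1
        m >>= 1
        shift += 1
    return total, adds
```

With a multiplier such as 1024 the shift-and-add
version needs a single add, while a multiplier with
many set bits needs one add per bit. The per-limb
multiply costs the same either way.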

In fact the problem doesn't lie in the calculations,
which are done in a fraction of a second. The
biggest concern is the conversion, because it is done
by shifting out binary digits and, after comparing
each with 1, deciding whether to add something to the
final decimal result or not.
This runs at about 350hex/s (hexadecimal digits per
second) on a 600MHz P!!!
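
One possible reading of this bit-by-bit conversion is
sketched below in Python. It is a guess at the idea,
not the original assembly: the names add_dec and
bin_to_dec_shift are hypothetical, and keeping the
current power of two as decimal digits is an
assumption.

```python
def add_dec(a, b):
    """Add two little-endian lists of decimal digits."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = carry
        s += a[i] if i < len(a) else 0
        s += b[i] if i < len(b) else 0
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return out


def bin_to_dec_shift(n):
    """Convert a non-negative int to a decimal string bit by bit."""
    result = [0]   # decimal digits of the answer so far
    power = [1]    # decimal digits of the current power of two
    while n:
        if n & 1:                       # bit is 1: add this power of two
            result = add_dec(result, power)
        power = add_dec(power, power)   # double for the next bit
        n >>= 1                         # shift out the bit just handled
    return "".join(str(d) for d in reversed(result))
```

Every bit costs a full-length decimal doubling, and
every set bit costs a full-length decimal add as
well, so the work grows roughly with the square of
the number's length.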

In the latest version the conversions are done with
divisions, which turned out to be much faster. At the
moment this holds second place in speed, and the
biggest slowdown is displaying the MessageBox.
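
A division-based conversion could look roughly like
the Python sketch below. The divisor 10^9, the 32-bit
little-endian limbs, and the names divmod_small and
bin_to_dec_div are all assumptions for illustration;
the program's real code may divide differently.

```python
CHUNK = 10**9       # assumed divisor: largest power of 10 below 2**32
CHUNK_DIGITS = 9    # decimal digits produced per division pass


def divmod_small(limbs, d):
    """Divide little-endian 32-bit limbs by d; return (quotient, rem)."""
    out, rem = [], 0
    for limb in reversed(limbs):       # most significant limb first
        cur = (rem << 32) | limb       # remainder:limb pair, as "div" takes
        out.append(cur // d)
        rem = cur % d
    out.reverse()
    while len(out) > 1 and out[-1] == 0:   # trim leading zero limbs
        out.pop()
    return out, rem


def bin_to_dec_div(limbs):
    """Convert little-endian 32-bit limbs to a decimal string."""
    parts = []
    while any(limbs):                  # stop once the quotient is zero
        limbs, rem = divmod_small(limbs, CHUNK)
        parts.append(rem)              # 9 decimal digits, lowest first
    if not parts:
        return "0"
    head = str(parts[-1])              # top chunk has no zero padding
    tail = "".join(str(p).zfill(CHUNK_DIGITS) for p in reversed(parts[:-1]))
    return head + tail
```

Each pass over the limbs yields nine decimal digits
at once, which is one plausible reason a division
approach beats working one bit at a time.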

Enjoy! :)